A Walk with SGD
Authors
Abstract
Exploring why stochastic gradient descent (SGD) based optimization methods train deep neural networks (DNNs) that generalize well has become an active area of research. Towards this end, we empirically study the dynamics of SGD when training over-parametrized DNNs. Specifically, we study the DNN loss surface along the trajectory of SGD by interpolating the loss surface between parameters from consecutive iterations and by tracking various metrics during training. We find that the loss interpolation between parameters before and after a training update is roughly convex with a minimum (the valley floor) in between for most of the training. Based on this and other metrics, we deduce that during most of the training, SGD explores regions in a valley by bouncing off valley walls at a height above the valley floor. This 'bouncing off walls at a height' mechanism helps SGD traverse larger distances for small batch sizes and large learning rates, which we find play qualitatively different roles in the dynamics. While a large learning rate maintains a large height above the valley floor, a small batch size injects noise that facilitates exploration. We find this mechanism is crucial for generalization because the valley floor has barriers, and this exploration above the valley floor allows SGD to quickly travel far away from the initialization point (without being affected by barriers) and find flatter regions, which correspond to better generalization.
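To make the interpolation probe described above concrete, the following is a minimal sketch (not the authors' code) of evaluating the loss along the line segment between the parameters before and after a single SGD update. The toy network, random regression batch, and learning rate are illustrative assumptions; in the paper's setting this slice would be computed on real training data at many iterations.

```python
# Minimal sketch (illustrative, not the paper's implementation) of interpolating
# the training loss between two consecutive SGD iterates theta_t and theta_{t+1}.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy over-parametrized regression network and a random minibatch (assumptions).
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
x, y = torch.randn(128, 10), torch.randn(128, 1)

def get_flat_params(m):
    """Concatenate all parameters into one detached, copied vector."""
    return torch.cat([p.detach().clone().reshape(-1) for p in m.parameters()])

def set_flat_params(m, flat):
    """Write a flat parameter vector back into the model in place."""
    offset = 0
    for p in m.parameters():
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n

# One SGD step on the minibatch, remembering the iterate before the update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
theta_before = get_flat_params(model)
optimizer.zero_grad()
loss_fn(model(x), y).backward()
optimizer.step()
theta_after = get_flat_params(model)

# Loss along the segment between the two consecutive iterates:
# L(alpha) = loss((1 - alpha) * theta_t + alpha * theta_{t+1}), alpha in [0, 1].
for alpha in [i / 10 for i in range(11)]:
    set_flat_params(model, (1 - alpha) * theta_before + alpha * theta_after)
    with torch.no_grad():
        print(f"alpha={alpha:.1f}  loss={loss_fn(model(x), y).item():.4f}")

set_flat_params(model, theta_after)  # restore the post-update parameters
```

In the study summarized by the abstract, the shape of this one-dimensional slice, such as whether it is roughly convex with a minimum strictly between the two iterates, is what is tracked over training to characterize the valley-floor behaviour.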
Similar resources
Robust Decentralized Differentially Private Stochastic Gradient Descent
Stochastic gradient descent (SGD) is one of the most widely applied machine learning algorithms in unreliable large-scale decentralized environments. In this type of environment, data privacy is a fundamental concern. The most popular way to investigate this topic is based on the framework of differential privacy. However, many important implementation details and the performance of differentially priv...
Finite-Time Analysis of Projected Langevin Monte Carlo
We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD). We show that LMC allows sampling in polynomial time from a posterior distribution restricted to a convex body and with a concave log-likelihood. This gives the first Markov chain to sample from a log-concave distribution with a first-order oracle, as the existing chains w...
Protective effect of Viola tricolor and Viola odorata extracts on serum/glucose deprivation-induced neurotoxicity: role of reactive oxygen species
Objective: Oxidative stress plays a key role in the pathophysiology of brain ischemia and neurodegenerative disorders. Previous studies indicated that Viola tricolor and Viola odorata are rich sources of antioxidants. This study aimed to determine whether these plants protect neurons against serum/glucose deprivation (SGD)-induced cell death in an in vitro model of ischemia and neurodegeneration...
The mechanism of neuroprotective effect of Viola odorata against serum/glucose deprivation-induced PC12 cell death
Objective: Oxidative stress is associated with the pathogenesis of brain ischemia and other neurodegenerative disorders. Previous research has shown the antioxidant activity of Viola odorata L. In this project, we studied the neuroprotective and reactive oxygen species (ROS) scavenging activities of the methanol (MeOH) extract and other fractions isolated from ...
Weighted parallel SGD for distributed unbalanced-workload training system
Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD [1], often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-perf...
Journal title: CoRR
Volume: abs/1802.08770
Pages: -
Publication date: 2018